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STOCHASTIC KINETIC MODELS: DYNAMIC INDEPENDENCE, 
MODULARITY AND GRAPHS^ 

By Clive G. Bowsher 

University of Cambridge 

The dynamic properties and independence structure of stochastic 
kinetic models (SKMs) are analyzed. An SKM is a highly multivari- 
ate jump process used to model chemical reaction networks, partic- 
ularly those in biochemical and cellular systems. We identify SKM 
subprocesses with the corresponding counting processes and propose 
a directed, cyclic graph (the kinetic independence graph or KIG) 
that encodes the local independence structure of their conditional 
intensities. Given a partition [A, D, B] of the vertices, the graphical 
separation A J- B\D in the undirected KIG has an intuitive chemical 
interpretation and implies that A is locally independent of B given 
Add. It is proved that this separation also results in global indepen- 
dence of the internal histories of A and B conditional on a history 
of the jumps in D which, under conditions we derive, corresponds to 
the internal history of D. The results enable mathematical definition 
of a modularization of an SKM using its implied dynamics. Graph- 
ical decomposition methods are developed for the identification and 
efficient computation of nested modularizations. Application to an 
SKM of the red blood cell advances understanding of this biochemi- 
cal system. 

1. Introduction and summary. The dynamic properties and conditional 
independence structure of stochastic kinetic models are analyzed using a 
marked point process framework. A stochastic kinetic model or SKM is a 
highly multivariate jump process used to describe chemical reaction net- 
works. SKMs have become particularly important as models of the network 
of interacting biomolecules in a cellular system. The necessity of a stochastic 
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process approach to the dynamics of such biochemical reaction systems is 
now clear [28, 30], with SKMs providing continuous-time, mechanistic de- 
scriptions firmly grounded in chemical kinetic theory and the underlying 
statistical physics. The Gillespie algorithm [9, 10] for simulation of SKMs 
is now an important tool in the science of systems biology. However, there 
are few analytical tools for study of the dynamic properties of SKMs (al- 
though note [1, 12] and [8]), especially when the SKM is of modest or high 
dimension. 

This paper develops what appear to be the first methods for analyzing 
the local and global dynamic independence structure implied by a given 
SKM and shows how these may be used to uncover the modular architec- 
ture of the network at coarser or finer levels of resolution. The required 
information about the parameters of the SKM is modest, and consistent 
with the partial information about these currently available for many bio- 
chemical reaction networks. SKMs are often thought of as continuous-time, 
homogeneous Markov chains having nonfinite state space. However, the fact 
that there are a finite number of possible types of jump of the process — 
corresponding to the different types of possible biochemical reaction in the 
system — allows formulation of both the SKM and its subprocesses as mul- 
tivariate counting processes. This turns out to be a fruitful approach for 
the problems addressed here. In fact, the Markov property is not needed 
for the results and methods of the paper. The main contributions may be 
summarized as follows. 

Graphical models for SKMs and dynamic molecular networks are intro- 
duced. These kinetic independence graphs (KIGs) are directed, cyclic graphs 
whose vertices are the different types (or species) of biomolecule in the sys- 
tem. The KIG encodes local independences that result from a lack of depen- 
dence of the conditional intensity of a subprocess on the internal history of 
some of the species. 

Given a partition [A,B,D] of the vertices, the graphical separation A 1. 
B\D in the undirected version of the KIG has an intuitive chemical inter- 
pretation and implies A is locally (or "instantaneously") independent of B 
given Add (and B locally independent of A given BUD). It is proved that 
this separation also results in conditional independence, over any finite time 
interval (0,t], of the internal histories of A and B conditional on a history 
of the jumps in D. Conditions under which this history corresponds to the 
internal history of D are derived and are easily checked computationally. 
Such a conditional independence is termed a global (as opposed to local) 
dynamic independence here. 

The new results enable mathematical definition of a modularization of an 
SKM using its implied dynamics. Graphical decomposition methods are de- 
veloped for the identification of nested modularizations that allow the extent 
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of coarse-graining to be varied and provide computationally efficient algo- 
rithms for large SKMs. Junction tree representations are shown to provide 
a useful tool for visualizing, summarizing and manipulating the modular- 
izations. Applying the techniques of the paper to an SKM that represents 
detailed empirical knowledge of the metabolic network of the human red 
blood cell yields new insight into the biological organization and dynamics 
of this cellular system. 

Graphical models and their associated analytical and computational meth- 
ods allow the modularization of large, complex models into smaller compo- 
nents and provide a particularly effective means of representing and ana- 
lyzing conditional independence relationships [3, 21]. Certain graphical ap- 
proaches are now used quite extensively in computational biology and have 
also been readily assimilated by the wider biological scientific community, 
which has long found diagrammatic representations of reaction schemes use- 
ful [15]. However, rigorous graphical representations of biochemical networks 
as dynamic processes — that is graphical models in the statistical sense — do 
not appear to have been considered previously. 

Indeed, graphical models for continuous time stochastic processes in gen- 
eral are in an early stage of development. Didelez [5, 6] introduced graphs 
based on the local independence structure of conditional intensities for fi- 
nite state, composable Markov processes and multivariate point processes, 
respectively; [22] is an earlier contribution, also for finite state Markov pro- 
cesses. SKMs require new methods since interest is in dynamic indepen- 
dences between groups of species rather than the counting processes for 
the different types of reaction per se. Furthermore, the Markov process for 
species concentrations implied by the SKM neither has finite state space, 
nor is it composable for most SKMs of interest (see Section 3). 

In practice, the SKM is constructed from a large list of the biochemical 
reactions that comprise the network under study. This list, or "network 
reconstruction," is usually compiled using extensive experimental evidence 
in the literature on the component parts of the system and their molecular 
interactions [26]. Indeed, the approaches of molecular biology and genetics, 
including genome sequencing, have already proved remarkably successful in 
providing life scientists with a very extensive "parts list" for biology. Systems 
biology is an increasingly influential, interdisciplinary approach that aims 
to describe mathematically the stochastic dynamic behavior of the whole 
system as an emergent property of the network of interacting biomolecules 
[30]. 

A principal challenge is thus to map from fine level descriptions such as 
reaction lists and their implied SKMs to higher level, coarse-grained descrip- 
tions of the dynamic properties. Related is the increasingly held view that 
biochemical reaction networks are modular, that is their architecture can be 
decomposed into units that perform "nearly independently" [18], and that 
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identifying such modules is a crucial step in the endeavor to understand and, 
ultimately, to selectively control cellular systems. However, it is recognized 
that rigorous, mathematical definition and identification of modularizations 
for biochemical networks is difficult, especially from a dynamic perspective 
(see [29], Chapter 3). As a result, such modularization techniques have been 
slow to develop, and there seems to be no prior work allowing for stochas- 
tic and non-steady state dynamics. The dynamic independence results and 
associated graphical methods developed here provide an effective means of 
addressing these problems. Broadly speaking, the paper also illustrates the 
utility of a statistical and probabilistic approach to the dynamics of biologi- 
cal systems which, despite their stochastic nature, have hitherto more often 
received the attention of physical scientists. 

The structure of the paper is as follows. Section 2.1 introduces SKMs 
and reaction networks in a manner requiring no previous background in sys- 
tems biology or biochemistry. Section 2.2 defines an SKM as a marked point 
process and provides a formal construction using the well-known Gillespie 
algorithm as a point of departure. Section 2.3 then shows how to accommo- 
date subprocesses of the SKM in a counting process framework and discusses 
their conditional intensities and internal histories (natural filtrations) . Sec- 
tion 3 introduces the kinetic independence graphs, or KIGs, and examines 
local independence and graphical separation in the undirected KIG. Section 

4 then relates these to global conditional independence of species histories 
in Theorems 4.4 and 4.5, which are central to the paper. Rigorous proofs 
of these theorems are quite involved and are given as Appendix A. Section 

5 develops graphical decomposition methods and associated theory for the 
identification of modularizations of SKMs, while Section 6 applies the tech- 
niques of the paper to the SKM of the human red blood cell. Section 7 
highlights some directions for future research. 

2. SKMs and counting processes. 

2.1. Introducing the SKM and reaction networks. A stochastic kinetic 
model is a continuous-time jump process modeling the state of a chemical 
system, X{t) = [Xi{t), . . . ,Xn{t)]', where Xi{t) is interpreted as the nonneg- 
ative, integer number of molecules of type i present at time t. The set of dif- 
ferent types of molecule or the species set is given by V := {1, . . . ,n}. There 
are a finite number of possible types of jump in X{t) that may take place, cor- 
responding to the different types of possible reaction, m € Ai := {1, . . . , M}. 
It is particularly useful for our purposes to view an SKM as a marked point 
process or MPP in which the points or "events" correspond to the jump 
times of the process X{t). Mathematically, a particular reaction can then be 
identified with an element of the finite mark space and each mark indicates 
the type of jump associated with the corresponding jump time. 
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An SKM is denoted here by {Ts, Zs}s>i, where Tg is the sth jump time. 
The mark Zs G {Sm\m € A4} is the value of the jump and is interpreted 
as the changes in the number of molecules of each species. The matrix 
S := [Si, S2, ■ ■ ■ , Sm] is usually known as the stoichiometric matrix. Any two 
columns of S are taken to be nonequal; hence, there is a bijection between 
the mark space and A4. A formal construction of an SKM is given below in 
Section 2.2 but it is helpful at this stage to note the following linear equation 
determining the dynamic evolution of X{t): 

X{t)=X{0) + SN{t), t>0, 

where N{t) = [Ni{t), . . . , NMit)]' is the M-variate counting process associ- 
ated with the marked point process {Ts, Zs}s>i- Thus, Nm{t) is interpreted 
as counting the number of reactions of type m during (0,t]. Denote by 
/"/^ := a{N{s); < s <t) the internal history of the entire process and by 
J-"™ := a{Nm{s); < s <t) the internal history of the mth counting process. 
The probability law of N{t), and hence that of X(t), is determined by what 
are known as the J^^-conditional intensities, [Xm{t);'m S M.]. 

The conditional intensity concept is important for an understanding of the 
paper. At time t, each intensity Xmit) is interpreted as the local (or instan- 
taneous) rate of reaction m, conditional on the internal history of the entire 
process J-^ . Confining attention to a finite interval of time T, provided 
that N{t) has finite expectation Mt^T (and that [Xm{t)'-,t € 7^ is bounded 
by an integrable random variable), each intensity is a local rate of reaction 
in exactly the chemical sense — that is, \m{i+) = lim/iio ^\h~^ {Nm{t + h) — 
^m{t)}\J-^], the conditionally expected number of reactions of type m per 
unit time in the limit as h goes to zero. Of course, the intensities are them- 
selves random variables (r.v.'s) since the evolution of up to time t is itself 
a stochastic process, hence the appearance of the conditional expectation. A 
technical subtlety is that Xmif) is defined to have sample paths that are left- 
continuous (with limits from the right), compared to the right-continuous 
sample paths of X{t). A heuristic chemical interpretation is that if a jump 
in X takes place at t, then future jumps are (locally) determined by the 
intensity evaluated "immediately after" t. 

A basic familiarity with the chemical representation and interpretation of 
reactions is helpful in what follows (see also [30] for an accessible introduc- 
tion). Each reaction m G 7W has the chemical representation 

(2.1) E /^i^i' 

jG-R[m] ieP[m] 

which is read as follows: when reaction m takes place, Oj molecules of type 
i are consumed for each i in the subset R[m] C V, and /3j molecules of type 
j are produced for each j in the subset P[m] C V. The species i?[m] are 
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called the reactants (or inputs) of the reaction m, and the species P[m] are 
called the products (or outputs) of m. The integer coefficients {/3j}] 
are known as the stoichiometries of the reaction. If a species k is a reactant 
but not a product, then its corresponding entry in the stoichiometric matrix 
S (i.e., the change in the level of k caused by reaction m) is given by = 
—ak- Alternatively, if species k is a product but not a reactant, then — 
/3fc. There is no assumption that R[m] nP[m] = 0, and if k is both a product 
and a reactant then Skm = Pk — oik- A common situation in this case is 
h = CKfc, that is k acts as a "catalyst," increasing the rate of the reaction 
but not itself being "changed" by the reaction — that is, not itself being 
overall consumed or produced when m takes place. Formally, the sets R[rn\ 
and P[m] are defined by allowing zero stoichiometries and writing the mth 
reaction as X^iGV*^*^* Yl,j(^v f^i-^i- Then R[m] := {i G V|a.j > 0} and 



In systems biology, a living cell is often viewed as a network of interacting 
biomolecules of different types, with n and M both large (and often M > n). 
The interaction is selective — only species that are reactants for some reaction 
m can together react to give products. Each reaction involves only a few 
species, so the cardinality of i?[m] U P[m] is small. Certain reactions are 
"coupled" in that a product of one reaction is also a reactant of another 
reaction. From a stochastic process perspective, the specification of the list of 
component reactions as in (2.1) for all m G implies dependences between 
the levels (or concentrations) of the different biomolecules. 

As a simple but nonetheless biochemically meaningful illustration, con- 
sider the following example of an SKM. 

Example 2.1. Consider the SKM with the 5 different species V = {P, 
R, g, P2, gP2} and the 6 reactions 



The gene (g) is responsible for the production of molecules of protein (P) 
via the intermediate (mRNA) species (R). In this simplified representation, 
g and R act as simple catalysts in the reactions trc ("transcription") and 
trl ("translation"), respectively. The third reaction d consists of the binding 
of 2 molecules of P (the sole reactant) to form the new molecule P2 (the 
sole product). The fourth reaction rd is the reverse of the third. The fifth 
reaction sets up a "negative feedback cycle" whereby the production of P is 
negatively self-regulated by the binding of P2 to g to form the distinct species 
gP2- Genes bound in this way to P2 are not then available to participate in 
the trc reaction, thus preventing over-production of the protein. We shall 
return later to the same example. 



P[m] :={jeV\Pj>0}. 



g g + R 
P2 ^'''^ 2P, 



R R + P, 
g + P2^'gP2, 



2P-^'^ P2, 
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2.2. Defining and constructing the SKM. The Gillespie stochastic simu- 
lation algorithm [9, 10] has become an important tool in biological science 
for studying biochemical and cellular systems. Given its familiarity in math- 
ematical and computational biology, the following construction of an SKM 
as a marked point process takes as its point of departure the conditional 
distributions employed in the Gillespie algorithm. For our purposes, the al- 
gorithm is usefully viewed as outputting a realization of the MPP {Tg^Zg}, 
from which the resultant process X{t) is easily constructed as in (2.4) be- 
low. Readers less concerned with formal constructions and already familiar 
with stochastic kinetics may proceed safely to Section 2.3 after noting Def- 
inition 2.1 of an SKM and (2.6) for the conditional reaction intensities (or 
"hazards"). 

Denote the numbers of molecules of all species at time Ts by := Zf_i + 
Zg {s = 1,2,...). Let Z^ be the initial, deterministic state of the system, 
and define Tq := 0. We write the c-field generated by the first r points and 
marks as '■= (^iTr,Zr; r = 1, . . . , s). Also let -^T := c(Ts+i, T^., Z,.; r = 
1, . . . , s), where the (s + l)th mark is excluded from the generating collection 
of random variables. 

Now introduce the important propensity (or reaction rate) function for 
the mth reaction, XmiZ^), where Am:No — > [0,oo) is continuous. The con- 
ditional distributions implied by stochastic kinetic theory [11] and employed 
in the Gillespie algorithm are given by 



(2.2) 



P(r,+i > t\TTj = exp|-(t - r,) ^ A„,(Zf )|, 

I m=l J 



t>Ts,s = 0,l,..., 



that is the waiting time to the next occurrence of a jump (reaction) is 
exponentially distributed with parameter "^^=1 "^rniZ^); and 

M 

(2.3) P(Z,+i = Sm\TT^^^_ ) = A^(Zf ) / A,n(^f ), 

m=l 

which gives the mark (or jump) distribution. Note that both the waiting 
time and mark distributions depend only on Z^ , the levels of the species 
present following the sth reaction. The pure jump process X(t) is given 
straightforwardly, for t > 0, by 

(2.4) ^W:=^max{.:T.<t}> X{0):=Z^, 

it being well known that X(t) is a time- homogeneous Markov chain under 
P. 
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It turns out to offer significant advantages and simplification to adopt a 
MPP framework for the problems addressed in the paper. An SKM is thus 
defined here directly in terms of the MPP {Ts,Zs} and its corresponding 
counting processes. It is implicit in our definition of a MPP that Tg < Tg+i 
whenever Tg < oo {s > 1). Thus, reactions occur instantaneously and no 
two reactions ever have identical occurrence times in continuous time. The 
physical interpretation is that reaction durations are negligible and may be 
ignored. The random variables Tg are (0, oo]-valued, with the interpreta- 
tion that less than s reactions take place during the time interval [0, oo) 
if Tg = oo. The flexibility gained will not be needed routinely, but may be 
useful for cellular systems that can enter an inactive or quiescent state. The 
stability condition lim^^^oo^s = oo a.s. is imposed, which is equivalent to the 
statement that only finitely many reactions occur in any finite time interval 
(sometimes known as nonexplosivity) . 

Definition 2.1. A stochastic kinetic model (SKM) is the MPP 
[{Tg, Zs}s>i, S, P] with mark space given by the columns of S, {Smlm G -M}, 
where no 2 columns of S are equal; and where the probability measure P is 
such that (2.2) holds P-a.s. on {Tg < oo}, (2.3) holds P-a.s. on {Tg^i < oo}, 
and lims-i-oo^s = oo P-a.s. 

Equivalently, the SKM may be denoted by the corresponding multivariate 
counting process (MVCP), [iV,5', P], where N := [Nm.{t);m G M]t>o, and 
Nm{t) = X^<j>i ^{Tg < t)l{Zg = Sm) counts the number of reactions of type 
m that occur during [0,t]. 

Note that by definition the reaction counting processes {Nm{t);m £ M} 
have no jump times in common. If the stability condition — )■ oo a.s. holds, 
there exists for any propensity functions Xm{Xt-) — see [16], Theorem 1.7, 
page 56 — a unique or canonical SKM satisfying Definition 2.1 on {Q,J^), 
where Q is the space of M-variate counting process paths ([16], Definition 
1.2, page 53), N is the identity map from — )■ J], and T = a{N{t);t > 0). 

It follows from (2.2) and (2.3) that the propensity functions give the J-^- 
conditional intensity process X{t) in the MVCP sense (see [16], Definition 
2.7), that is, X{t) = [\m{Xt-)]m<=M ■ When N{t) has finite expectation > 0, 
this means that [Nm{t) — /q Am(s)(is] is an J^^^-martingale Vm. That the 
intensities satisfy 

lim ^P(iV^(t + h)- Nm{t) = 1| ) = Xm{Xt), m€M, 

(2.5) 

liin-P(Ar(t + /i)-iV(f)>l|J-f) = 0, 

where N{t) := XlmeX ^® ^ principle conclusion of the argu- 

ments of stochastic kinetic theory [11]. The assumptions of the theory are 
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that the system is spatiahy homogeneous (or "well-stirred"), confined to a 
fixed volume and held at constant temperature. Under these assumptions, 
(2.2) and (2.3) have a firm physico-chemical basis [11]. 

It plays a significant role in what follows that the theory implies that the 
^-"/^-intensities, Xm{t), have the form 

(2.6) Xm{t)=Cmgr,^{X''^'^\t-)}, 

where > is a deterministic ("rate") constant, and gm{'} > is a con- 
tinuous function depending only on the levels of the reactants R[m]. 

2.3. SKM subprocesses — histories and intensities. For any subset of 
molecular species j4 C V, let the vector process {X^{t)} := {Xi{t);i G A} 
denote the corresponding subprocess of X. We identify X^ with its MVCP, 
analogously to the treatment of X = above. For A CV, consider the 
subset of reactions A(A) C Ji4 that change (the level of) A, that is, A(A) := 
{m e A4 : ^ 0}, where is the subvector of Sm corresponding to the 
elements of A. One can identify with the MPP, {T^^,Zf}, where each 
jump time corresponds to the occurrence of some reaction in A{A); 
the mark gives the resultant jumps in the elements of A and takes its 
value in the mark space := {S^\m G A(j4)}. This results in the following 
definition of an SKM subprocess. 

Definition 2.2. Let [N,S, P] be an SKM and for AcV, let A{A) be 
the nonempty, finite subset {m € A4:S^ ^ 0}. Denote by M.{A{A)) the 
partition of A (A) obtained by grouping reactions that change A identically, 
that is by applying to A(^) the equivalence relation 

m m' <^ S:^ = S^i . 

Denote the eth element of A4(A(^)) by 7We(A(A)), e = 1, . . . , \M{l^{A))\. 
The subprocess of the SKM, N^{t), is the |A^(A(A))|-variate counting pro- 
cess given by 

(2.7) N^{t):=\ Yl ^rn{t)] 

^meMeiAiA)) J e=l,...,\M{A{A))\ 

The internal history of N^(t) is denoted J^/^. 

Note that since A4{A{A)) is a partition of A (A), the components of N^{t) 
have no jumps in common. Each element of the MVCP N^{t) thus counts 
the number of times reactions in A(^) have occurred that result in a given 
change in A. Intuitively, putting these elements together for all possible 
types of change in A to form a sample path of N^{t) captures exactly the 
"information" given by the corresponding sample path of X^{t). Indeed, 
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there is a bijection between the sample paths of N'^{t) and those of X^{t). 
The following technical lemma establishes that the internal history of the 
MVCP iV^(t) is identical to that of 

Lemma 2.1. For A C V, let N^{t) be a subprocess of an SKM as in 
Definition 2.2 and let J-^^ := a{X^{s); s <t) be the internal history of the 
jump process X^{t). Then J-^ = J-f Vt > 0. Furthermore, if B, D] is a 
partition of V then = J"/^ V V jf' = , Vt > 0. 

Proof. A proof that J^f = is given in Appendix B. For Tf = J^^ , 
take A = V. Finally, = ^ V V since X{t) = 
X^it)']'. □ 

One advantage of a counting process definition of the subprocess for the 
species in ^4 C V is that one may speak of the J^/^-intensity for the sub- 
process and interpret this in the usual manner as determining the local or 
instantaneous dependence of the subprocess on the full internal history of 
the SKM, . 

Proposition 2.2. For AQV, let N'^{t) be a subprocess of an SKM as 
in Definition 2.2. The -conditional intensity under P is given by 

(2.8) \^{t):={ 

^mdMeiHA)) ^ e=l,...,l>)(A(A))| 

Proof. Immediate from (2.7) on noting that the intensities of the su- 
perpositions of the counting processes are the sums of the corresponding 
intensities. □ 

Notice that each element of the intensity, A^(t), is the sum of the intensities 
(or stochastic rates) of all those reactions that result in the corresponding 
change in A. 

It follows from the equations in (2.5) that, for any A C V, the probability 
conditional on that, during (t,t -|- h\, there is no change in X^ is equal 

to 1 — /iX^i'^i^^'^^^' A^(t) +o{h). Similarly, the probability conditional on 
that, during {t,t + h], there is exactly one jump in X^ equal to S^^, 
for some m E Ade{^{A)), and also that no other reaction m' € 7W occurs 
is equal to hX^{t) + o{h) for e = 1, . . . , |A^(A(^))|. Summing over all of 
the foregoing, mutually exclusive events shows that these have conditional 
probability equal to 1 -|- o{h). Thus, in this infinitesimal sense, the 7f - 
intensity X^{t) may be interpreted as determining the local dependence of 
N^{t) on 7f . 
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3. Kinetic independence graphs. The identification of subprocesses of 
the SKM with their corresponding MVCPs (see Section 2.3) greatly facil- 
itates the construction of a kinetic independence graph encoding the local 
independence structure of the SKM — see Definition 3.1 below. The use of 
the local independence concept in constructing graphical models for contin- 
uous time processes owes much to Didelez [5, 6]. However, SKMs require 
new methods since interest is in dynamic independences between groups 
of species rather than the reaction counting processes [Nm{t)]m£M per se. 
Thus, the vertex set of the graph will be V rather than Ai. 

It is worth noting that existing graphical models for continuous-time 
Markov chains [5, 22] are not applicable to SKMs because the Markov pro- 
cess X{t) neither has finite state space, nor is it composable for most SKMs 
of interest. Roughly speaking, composability [5] implies that any change of 
state in X{t) can be represented as a change in only one of several com- 
ponents. Consider the use of X'^{t) and as components [since if 
X{t) is composable with more than 2 subsets of species as components, it 
must be composable with just 2 components] — either the paths of X^{t) and 
have common jump times contradicting that X{t) is composable, 
or they constitute 2 separate SKMs which then require a new method for 
their individual analysis. 

The kinetic independence graph of an SKM is defined as follows. 

Definition 3.1. The directed graph G with vertex set V is the kinetic 
independence graph (KIG) of the SKM [N, S, P] if and only if 

(3.1) pa{k) = R[A{k)]\{k} VfceV, 

where pa(A;) = {z € V\i k} is the set of parents of vertex k, and R[A{k)] := 
UmGA{fc) is the set of reactants of all reactions that change species k. 

Since only partial information about the SKM is required for construc- 
tion of the KIG, the necessary information is currently available for many 
biochemical reaction networks. For each m € A^, it is required to know the 
reactants R[m], and the species (reactants and products) changed by the re- 
action, that is, {i € V\Sim 0}. Full knowledge of the stoichiometric matrix 
S is neither necessary nor sufficient for construction of the KIG. Note that 
the possible presence of a catalyst among the reactants R[m] implies that 
R[m] cannot be reliably reconstructed from S. No knowledge of the rate 
parameters Cm is required for construction of the KIG, which is important 
since their measurement is difficult experimentally. 

A comment will be useful at this juncture on the treatment of measura- 
bility considerations in the paper. While the treatment is fully rigorous, it 
is appreciated that some readers will be more concerned with application 
of the paper's results. Proofs requiring a measure-theoretic approach have 



12 



C. G. BOWSHER 



therefore been placed in Appendices A and B. Note that a statement such 
as the one that Xmit) is (as it must be) measurable J-^ implies that the 
realized value of the r.v. Xmit) may be "computed" from the sample path 
of the subprocess for R[m] over the interval [0,t]. 

The motivation for Definition 3.1 of the KIG of an SKM is that the local 
evolution of species k depends only on the stochastic rate of reactions that 
change the number of molecules (the level) of k, which in turn depend only 
on the levels of their reactants. To make this exact, the concept of local 
independence [6] is needed. Let A,BcV. We will say that A^'^ is locally 
independent of A^"^ (given N^\^) if and only if the J^^^-intensity, X^{t), is 

measurable J-^"^^ for all t — that is, the internal history of is irrelevant 
for the J^^-intensity of the species in B. Only intensities of subprocesses 
conditional on the history of the whole system, J-J^ , are considered here (as 
opposed to -intensities where Qt C J^^)- 

As a consequence of Definition 3.1, one can read off from the KIG, for 
any collection of vertices B, those subprocesses with respect to which A^^ 
is locally independent, that is, which are irrelevant for the instantaneous 
evolution of B. Denote the closure of B by cl(-B) := pa(-B) U B. 

Proposition 3.1. Let G he the KIG of the SKM [N, S, P] and let A, Be 



IS 



V. Then the -intensity X^{t) is measurable for all t, that is, 

locally independent of N^^^^^^'^ (given A^'^'(^) J. Suppose that Ancl{B) = 0. 

Then X^{t) is measurable J-^"^^ . 

Proof. By (2.8), each intensity X^{t) is measurable J^^^^^^'^^ because 
Xm{t) = Cm9m{X^^"'\t-)} is measurable ^'^f^^' = Vm G A{B), 

recalling that R[A{B)] = \JmeA{B) ^H- Since pa(S) = {[jkeBMk)}\{B} = 
{[j^^gR[A{k)]}\ {B} = {R[A{B)]}\{B}, it follows that R[A{B)] C d{B) , 
and hence J^^^^^^^^ c J^f^^^ by Lemma 2.1. Thus, each intensity X^{t) is 

measurable J-f^^\ and the remainder of the proposition follows immedi- 
ately. □ 

Proposition 3.1 accords with chemical intuition. Given the internal his- 
tory of X^^^^^ at time t, the levels of the species R[A{B)] just prior to t are 
"known." These are exactly the species levels that determine the local dy- 
namics of B since, as reactants, they determine the rate of all reactions that 
change the concentrations of B. Therefore, any further information about 
species histories, including the internal history of A^^\'^^(-^), is irrelevant for 
the local dynamics of B. 

Notice that loops, that is edges of the type k ^ k are by definition not 
included in the KIG, even though k may well be in R[A{k)]. For this rea- 
son, one cannot assert in Proposition 3.1 that X^{t) is measurable 
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but rather that it is measurable J-^^^^\ More generally, a particular SKM 
may imply further local independences of A^(t) than those encoded by the 
KIG — for example, due to a deterministic relationship between two subsets 
of species arising from a chemical conservation relation — but this level of 
knowledge about the SKM is not assumed in constructing the KIG. 

Graphical separations in the undirected version of the KIG, written C", 
are central in what follows. Diagrammatically, C is the undirected graph 
obtained from G by substituting lines for arrows. Let A,B,D C V. The 
notation A B\D stands for the graphical separation of A from B by 
D, that is, the property that every sequence of edges (or path) in G~ that 
begins with some a & A and (without any repetition of vertices) ends with 
some b € B, includes a vertex in D. With a partition of V, such 

a separation in is equivalent to the nonexistence of (a S A, 6 E B) such 
that there is an edge a ^ 6 or an edge 6 ^ a in G. This graphical separation 
implies the following mutual local independence property. 

Proposition 3.2. LetG be the KIG of an SKM [N,S,P], and let [A, B , D] 
be a partition of V . If A J-g~ B\D, then is locally independent of 
(given N^^^ ) and is locally independent of (given N^^^ ) or, equiv- 
alently, \^{t) is measurable Tf^^ and \^{t) is measurable F^^^ . 

Proof. A^g~ B\D, if and only if Sncl(^) = ^ncl(-B) = 0. The result 
then follows directly from Proposition 3.1. □ 

Note that it follows from the definition of the KIG that the graphical 
separation in Proposition 3.2 is equivalent to the chemical property A n 
R[A{B)] = B n R[A{A)] = 0. That is, A does not participate as a reactant 
in any reaction that changes B, and vice versa. Therefore, for example, 
i?[A(i?)] C BUD — hence, given the levels of B and D (which fully determine 
the rate of reactions that change B), the levels of A are irrelevant for the 
instantaneous evolution of B, and vice versa. Section 4 will establish that 
under weak regularity conditions on the SKM, the separation A -Lg~ B\D in 
G'" implies not only mutual local independence but also global conditional 
independence of the internal histories of A and B given a history of D. 

As an illustration of the concepts discussed so far, consider again the 
SKM of Example 2.1. The corresponding KIG is shown in Figure 1. Note the 
presence of cycles in the KIG, including g—^R^P^P2—^g which might 
be termed the "negative feedback cycle." Clearly, {P,R}±g~ {ff-fblllS; -fb}- 
Let D := {g, P2}. Notice that, according to Definition 2.2, SKM subprocesses 
are given by N^^^ = [Nk,N^b]' and = [N^, N^d, N^, N^b]' ■ Hence, T^' C 
TP , and the global independence /'/''^ _IL J"/^^ | J^/^ holds immediately in 
this case. 
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Fig. 1. Kinetic independence graph of the SKM in Example 2.1. 

Anticipating the problem of how to modularize SKMs to be tackled in 
Section 5, a modularization suggested for the SKM of Example 2.1 by 
its dynamic independence properties is the modularization [{P, i?, (7,^2}) 
{gP2,g, P2}]- The 2 module "residuals" are given by {P, R} and {gP2}- Each 
module residual is locally independent of the other given that module's inter- 
nal history. Furthermore, the 2 modules are conditionally independent given 
the history of their intersection, Tf'^^ . In fact, these 2 modules correspond 
to the maximal prime subgraphs of for this example (see Definition 5.2). 
The graphical methods for identifying SKM modularizations in Section 5 
are, broadly speaking, also based around the maximal prime decomposition 
of the undirected KIG. 

4. Global dynamic independence. This section will present the theorems 
establishing that for a partition \A^B^D\ of V, the separation of A from 
i3 by Z) in the undirected version of the KIG implies the global dynamic 
independence J-"/^ _IL for all t > 0, under the probability measure of 

the SKM, P. 

The history of given by F^* is defined formally below. Heuristically, 
TP includes at time t the internal history of the jump process for the species 
in D C ), and also for every jump time of always contains the 
"information" whether some species in A jumped, or some species in B 
jumped, or some species in both A and B jumped, but not (necessarily) 
the particular species involved. The main proofs of the theorems are quite 
involved and are given in Appendix A. Readers less concerned with technical 
details will find an outline of the argument and intuitions for important 
aspects of the proofs in this section of the paper. It is worth defining here 
explicitly what is meant by the conditional independence of cr- fields [4, 7]. 

Definition 4.1. Let (r2,J-",P) be an arbitrary probability space and 
suppose we have 3 sub-ci-fields C T. We say that and T^are 
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independent conditionally on J-"^ and write _IL P if and only if 

E[Zi|^2 V J"^] = E[Zi|J^3] 

for all nonnegative random variables Z\ that are measurable J-^ . The no- 
tation J-"^ V stands for the smallest u-field containing both T"^ and . 
The relationship is symmetric, that is, JLJ"^! J''^; P <J4> J"^ JLJ'-'^j J"^; P. 

Thus, the global dynamic independence statement _IL Tf\J-^ can be 
understood as follows: the expectation of (suitably measurable) mappings 
from sample paths of A^"^ (resp., A^^) over (0,t] to M, conditional on the 
history , are unchanged when the conditioning a-field also includes the 
internal history of A^'^ (resp., A^^). Roughly speaking, and over any time 
interval (0,t], all "information" about the dynamic evolution of B is irrele- 
vant for the dynamic evolution of A, given the "information" in (and 
vice versa). 

First, an outline of the logic of the argument of this section is presented, 
before going on to state the main theorems. 

4.1. Preliminaries and outline of argument. The following lemma is cen- 
tral to the method used. Although closely related to a result in [7], I am not 
aware of its statement and proof elsewhere. 

Lemma 4.1. Let P, P be probability measures on an arbitrary measurable 
space, (O, J-"), such that P ^ P. Consider any 3 sub-a-fields J-^,J-^,J-^ C 
satisfying the conditional independence J-^ _IL P under the dominating 

measure P. Denote by C123 a Radon-Nikodym derivative, (dPt/dPj)jjrivjr2vj-3 

Then the following condition implies that the conditional independence 

ALT'^\T^ holds also under P: 

J0i23 = V'13V'23, 

where V'ia is a nonnegative, J-"' \/ J^"^ -measurable random variable for i € 
{1,2}. 

Proof of Lemma 4.1 is given in Appendix B. 

Since J-^ is a history of A'^^ (i-e., J-^ C J-^'), it follows from Lemma 2.1 
that V 7f V FP' = F^ = . k likelihood process Ct := (dPt/dPt)|^iv 
is thus required in order to apply Lemma 4.1 to the 3 cr-fields Ff^,Ff ,FP* . 
Given its importance here, we restate for an SKM the following likelihood 
result from the counting process literature (see, e.g., [16], Theorem 4.1, page 
74). The proof is omitted since it is well known. 
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Lemma 4.2. Let [N, S, P] be an SKM as in Definition 2.1, and let [iV, P] 
he the M -variate Poisson process with intensities Ijv/ = (1, • ■ • , !)'• Then, for 
every t>0, Pt CLnd a Radon-Nikodym derivative is given by 



(4.1) 

dPt 



■'t 



n| n A„(XT™_)|exp|t- / X^{Xu- 



\ du 



Note that the counting processes [Nm] m= 1, . . . , M] are independent under P 



(see, e.g., [17], Proposition 4.1.2), and hence the a-fields [Tl"; 
are independent under P and Pf for all t>0. 



m ■■ 



1,...,M] 



Of considerable importance here will be the fact that, under the domi- 
nating measure P, the counting processes [Nm] m = 1, . . . , M] are indepen- 
dent. Of course, two or more of the subprocesses [A^"^, A^'^, A^'^] may have 
jump times in common as the result of reactions that simultaneously change 
several of the species sets [A,B,D]. However, denoting by ADd the re- 
actions that change D alone, the reaction set A4 can be partitioned as 
[A{A) , A{B) \A{A),ADd]. The independence of the reaction counting pro- 
cesses then implies that F^^^'^ jj_ jr^(-S)\^(^) j^rADu. -which is a point of 
departure for proving Theorem 4.4 below. 

To apply Lemma 4.1, we first establish that if A -Lg~ B\D, then _IL 
T^\J-P under the dominating measure P (see Theorem 4.4). We then show 
that the factorisation Ct = ipAD*,t'4^BD*,t holds with, for example, il^AD*,t 
an J-^ V J^/'* -measurable r.v. (see Theorem 4.5). We are thus able to con- 
clude that if A ±G~ B\D, then the SKM must satisfy J"/^ _IL Jf | J"/^*; P (see 
Corollary 4.6). 

Definition of the filtration {J'P*} is needed. This is best understood as 
the internal history of a particular MVCP, N^*[t), defined now below. 



Definition 4.2. Let [A,B,D] be a partition of V, the species set of 
an SKM. Define ADa ■= {A{D) n A{A)} \ A{B), the set of reactions that 
change D and A, but not B. Similarly, define ADab ■= A{D) n A{A) n A{B) , 
ADb := {A{D)nA{B)}\A{A), and those that change D alone by ADd := 
A{D) \ {A{A) U A{B)}. Then [ADa, ADab, ADb, ADd] is a partition of 
A{D), the reactions that change D. 

The MVCP N^* (t) is constructed by taking each element, AD,, of this 
partition in turn and summing over counting processes for reactions in AD, 
alone that result in identical changes to D — that is, by applying to each 
AD, the equivalence relation 

mr^ m' 4^ = S^, , m, m' G AD,. 
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The resultant MVCP, denoted iV. (t), is given by 

^meMeiAD.) J e=l,...,|^(AD.)l 

which differs from the SKM subprocess N^{t) of Definition 2.2 in that the 
partition A4{AD,) is used in place of M{A{D)). Then define 

N^\t) := [NS{t),NUt),NE{t),NS{t)]. 

Here AD, is empty, the relevant component of (t) is set equal to zero 
Vt > 0. The corresponding internal histories of N^{t) are written {T^{t), 

Thus, as stated previously, at time t Tf* includes the internal history of 
the jump process for the species in D, J-^ , and also for every jump time of 
always contains the "information" whether some species in A jumped, 
or some species in B jumped, or some species in both A and B jumped, but 
not (necessarily) the particular species involved. An alternative formulation 
of J-P* would be as the internal history at t of the marked point process 
{r^,Z^}, where the marks give not only the value of the jump in D 
but also an indicator of which element, AD,, the reaction causing the jump 
in D belongs to. 

In some applications, it may be more convenient or practical to use only 
internal histories. Section 4.3 will thus provide a rigorous and intuitive means 
of comparing N^\t) and iV-^(t)— through comparison of the corresponding 
partitions of the reactions in A{D) — and state a property that is easily 
checked for a given SKM under which J^P = . 

The following conditions on the SKM are used in the statement of the 
results of this section. Both Theorems 4.4 and 4.5 assume that the SKM is 
standard, which imposes the following very weak regularity conditions. 

Definition 4.3. An SKM [iV,5, P] is a standard SKM if it satisfies 
all of the following: (i) every reaction changes at least 1 species, that is, 
Sm 7^ Vm e {1, . . . , M}; (ii) every species in V is changed by at least one 
reaction, that is, the row S'fc, 7^ V/c G V; (iii) if a zeroth order reaction m 
is included (i.e., R[m] = 0) then it has only 1 product; (iv) for all m, if 
\R[m]\ = 1 then \R*[m]\ = 1 and if \R[m]\ > 1 then \{R[m]} \ {R*[m]}\ < 1, 
where i?*[m] = {i £ R[m]\Sim 7^ 0} are the reactants changed by m. 

The first condition of Definition 4.3 is obvious. The second does not pre- 
clude an eff'ect of the concentration of species that are constant over time 
(via the functions gm or the constants Cm)- The third is just a convention. 
The fourth ensures that if R[m] 7^ then the reaction has at least 1 reactant 
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that is changed and at most 1 reactant that is not changed. It allows for 
the inclusion of a reaction with an unchanged reactant where this simplifies 
the SKM, for example, where that reactant acts as a catalyst [although the 
reaction could be broken down into several reactions not requiring condition 
(iv) if desired] . 

Theorems 4.4 and 4.5 also require that the following condition holds for 
r = A(^) n A{B). If A(^) n A(_B) = 0, as sometimes happens, then the 
condition is trivial and always satisfied. 

Condition 4.3. Let [A^, S, P] be a standard SKM. A subset of reactions 
r, C r C A4, is said to be identified by consumption of reactants if and 
only if: 

(i) For all m € F, Sim < Vi S R[m] (hence, Skm < for some k € R[m] 
provided that R[m\ / 0), and Sim > Vi € P[m]; and (ii) $m,rfi G F (m 7^ 
m) such that 5~ = S^ , where 5" denotes the vector formed by setting all 
positive elements of Sm to zero. 

Remark 4.1. Condition 4.3 implies that no 2 reactions in F change 
reactants identically, hence the reactions in F are identified uniquely by 
their consumption of reactants. Condition 4.3 will be satisfied with F = 
by most SKMs of interest, possibly after explicit inclusion of enzymes in 
reaction mechanisms. Although autocatalytic reactions such as Xj +Xk -^"^ 
2Xk and its reverse violate condition (i), these could be accommodated by 
instead including a more detailed mechanism, for example, Xj + X/^ 
XjXk and XjXf, -^""^ 2Xk. 

An alternative approach would be to work with Gj, the graph obtained 
from the undirected version of the KIG by adding an edge j ~ A; when- 
ever [Sjm. > and Skm > for some m G M] and j 00 k in . The graph 
GJ might be termed the fraternized (as distinct from moralized) version of 
the KIG. The separation A _Lg~ B\D implies that A(A) n A(S) = [since 
A B\D and hence R[rn\ C D for any m € A(A) fl A(i?)]. Therefore, if 
A _Lg~ B\D is replaced by A Lq- B\D, Condition 4.3 for F = A(^) n A{B) 
can be dropped from the statements of Theorems 4.4 and 4.5, and from that 
of Corollary 4.6. 

4.2. Global independence theorems. We are now in a position to state the 
main results of Section 4 of the paper. Theorem 4.4 is concerned with global 
dynamic independence under P, the law of the M-variate Poisson process 
(see Lemma 4.2). 

Theorem 4.4. Let G be the KIG of a standard SKM, [A^,5, P], and 
let [A,B,D] be a partition of V . Suppose also that Condition 4-3 holds for 
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r = A{A) r)A{B) (where T is possibly empty, in which case the condition is 
trivial). Then A J-g~ B\D implies that }LTf\T^*\Pt, where \7't^*\ is 
the natural filtration of N^*{t), N^* (t) is given by Definition and P is 
the law of the M-variate Poisson process in Lemma 



We provide here a somewhat heuristic discussion of this result, a rigorous 
treatment being given in Appendix A.l. The argument can be broken down 
into four steps. 

First, the reaction counting processes [A^^; m = 1, . . . , M] are independent 
under P. Therefore, 

(4.2) J-f(^) XT-f (^)\^(^)|7-t''''";Pt, 

since [A(A), A(i?) \ A(^), A.Dd\ is a partition of the reaction set TM. Equa- 
tion (4.2) holds because the three MVCPs associated with each element of 
the partition are (unconditionally) independent. 

Second, consider again Definition 4.2 for N^* {t) = [{N^{t),N^^{t)}, 
{Nj^{t)},{Ng{t)}]. The internal history of the first component MVCP in 
curly parentheses must be contained in the internal history of N^^^\t). [All 
the reactions involved in that component change A and hence the sample 
path of N^^^\t) implies that of the first component.] Similarly, the internal 
history of the second component of [t] in curly parentheses must be con- 
tained in that N^^^^\^^^\t) . The internal history of the third component 
is equal to J't^^'^ ■ Combining the internal histories of these 3 components 
making up (t) must give J-P . Therefore, the internal histories of the 
first 2 components can be used to expand the conditioning information in 

(4.2) to give 

(4.3) 7-f(^)xj-f(^)\^(^)|J-r;Pt. 

Third, establishing the property J^^^"^^ _1L T^^^^ \J-f ; Pt implies the global 
dynamic independence in Theorem 4.4 since the internal history of the sub- 
process for A must be contained in that of N^^^^ [since the sample path of 
N^^^\t) obviously implies that of N^(t)], and similarly for B. This property 
in turn follows by showing that the internal history of N'^^^^'~^^^^\t) is con- 
tained in Tf* . The second c-field in (4.3) can then be expanded to include 
j-A{A)nA{B) ^ Combining the internal histories ^nd 
m this way gives J-f . 

Finally, jr^(^)^^(^) c J^P is a direct consequence of the fact that ADab = 
A(^) n A{B) — that is, all reactions that change A and B also change D — 
and that the reactions in AD^b change D uniquely (among themselves). 
These properties of AD^b depend crucially on the graphical separation 
A J-G~ B\D and also on Condition 4.3 holding for ADab (under the condi- 
tions of Theorem 4.4). The separation ensures that for any m € A(^)n A(S), 
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the reactants of m are all in D (otherwise, we would have either B ox 
B ^ A va the KIG) and hence also m € A(D). Condition 4.3 ensures that 
the members of /S.Dab are identified by consumption of reactants, hence 
the reactions in /S.Dab niust change D uniquely (among themselves) and 
^A(A)nA{B) =j-i>^(i). Therefore, jf(^)"^(^) CT-/^*, since iVi^B(t) isacom- 
ponent of N^'it). 

We now turn to consider global dynamic independence under P, the law 
of the SKM. 

Theorem 4.5. Let G be the KIG of a standard SKM, [N,S,P], and 
let [A,B,D] be a partition ofV. Suppose also that Gondition 4-3 holds for 
r = A{A) nA{B) (where T is possibly empty, in which case the condition is 
trivial). Then A J-g- B\D implies that 

Ct := {dPt/dPt)\^N = iJAD'^t ■ i^BD*,t, t > 0, 

where ipiD*,t is a nonnegative, Tl \J -measurable random variable for 
i G {A,B}, and {J^P } is the natural filtration of N^*{t). 

Taking the logarithm of the likelihood in (4.1) yields 

iogCt= [i- rA™(tx)dn + ^i(rr <t)iog(A^(rr)) 

(4.4) 

Theorem 4.5 may be established by showing that, Vm G M, lm{i) is mea- 
surable either JT/^^* := -F/^ V F^* or jf ^* := J^f ^ T^* . We explain here 
how Imit) niay be computed (Vm € Ai) using either just the sample paths 
of N^{u) and N^*{u), or just the sample paths of N^{u) and N^* (u). It 
is clear from (4.4) that Imit) niay be computed when Xm{u) niay be com- 
puted for all u S (0,t] and the sample path of the counting process for that 
reaction, Nm{u), may be computed over the same time interval (so that the 
jump times {TJ" < t} are known). There are two main elements involved in 
the argument. 

First, the graphical separation A -Lg'~ B\D again has an important impli- 
cation for reactants: for any reaction m, either R[m] <^ AU D or R[m] C 
B U D. Recalling (2.6), only the sample path of the subprocess for the 
reactants R[m] is needed to compute Am (it), hence the sample paths of 
the subprocesses for either or [-B,-D] suffice, according to whether 

R[m] C AU D or R[m] B U D. [The sample path of D can clearly be 
computed from that of A^^*(u).] 
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Second, the sample path {Nm{u)]u < t) may be computed using just 
the sample paths of [N^,N^*] or [N^,N^*], again according to whether 
R[m] C Add or R[m] C BUD. To see this, consider each group of reactions 
in the partition of M given by [ADd, ADab, ^{A) \ A{B),A{B) \ A{A)], 
beginning with m € ADjj. By definition the path of (n), specifically 
of its subcomponent N£{u), allows identification of the jump times corre- 
sponding to all reactions in AD^i that change D identically to m. But since 
such reactions in ADd change D alone, they must do so uniquely (among 
reactions in AD^) since no 2 columns of S are equal (Definition 2.1). There- 
fore, the path of N^' (u) suffices in this case to compute {Nm{u);u < t). 

The argument for other groups in the partition is similar. For m G ADab, 
it has already been noted that the reactions in ADab change D uniquely 
(among themselves). The argument for the last 2 groups is essentially the 
same. The third group is further partitioned as [AD^, A*(74)], where A*(^) 
are the reactions that change A alone. Consider m G AD a — again, by def- 
inition, the path of N^* (u) [specifically, of (u)] allows identification of 
the jump times corresponding to the subset of reactions in AD a that change 
D identically to m. This subset may now contain more than 1 reaction, but 
inspection of the value of the jumps in the sample path of the subprocess 
for [A U D] corresponding to the jump times so identified allows one to "iso- 
late" just those caused by reaction m (since, again, reactions in AD a change 
AuD uniquely among themselves). The argument for m G A*(^) is similar, 
after noting that the jump times of all reactions in A*(^) can be identified 
by eliminating all those of and of N^^. 

The preceding two theorems allow the use of Lemma 4.1 to obtain the 
following corollary, which summarizes the main results of Section 4. 

Corollary 4.6. Let G be the KIG of a standard SKM, [TV, 5, P], and 
let [A^B,D] he a partition of V . Suppose also that Gondition 4-3 holds for 
r = A{A) n A(B) (where T is possibly empty). Then the separation A -Lg~ 
B\D in the undirected KIG implies that the global conditional independence 
J-^ ^ Tf\T^* \ Vt holds \ft > 0, where {J~P*} is the natural filtration of 
N^*{t). 

Proof. Apply Lemma 4.1 to the 3 cj-fields Tf^,Tf,TP* C , recall- 
ing from Lemma 4.2 that Pt ^ Pt- Since A -Lg~ B\D, Theorem 4.4 im- 
plies that J't^ AL Jf IJ"/^*; Pt. Now V 7f V * = 7f , whence (dPj/ 
dPt)|jrAyj-SyjrD* = A , wMch Is glveu by (4.1). Again since A ±g~ B\D, 
Theorem 4.5 implies that Ct = tpAD*.t ■ '^BD*,t, where ipiD'.t is a nonnega- 
tive, J-'l V J-"/^* -measurable random variable for i G {A,B}. Lemma 4.1 then 
implies that T/^ _IL ; Pt, as required. □ 

Under the conditions of Corollary 4.6, the separation A -Lg~ B\D does not 
imply in general that J-^ _1L J^P\TP; Pt, where the conditioning is now on 
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Fjr' rather than . Similarly, the separation in the moral graph, A J-g™ 
B\D, does not imply that J-"/^ JL Pj . The following theorem and 

proof establishes both points. The procedure for constructing G"* is the 
usual one — edges are inserted in the KIG whenever 2 parent nodes of a 
common child are "unmarried" (i.e., have no edge between them) and then 
the undirected version of the resulting graph is formed. 

Theorem 4.7. Let G he the KIG of a standard SKM, [iV,^, P], and 
let [A,B,D] be a partition of V . Suppose also that Gondition 4-3 holds for 
r = A(yl) n A{B) and A _Lg~ B\D. Then it is possible that neither 7^ _IL 
Pf nor JL Tf\J-P\ P( holds, where is as usual the internal 

history of the subprocess (t) . 

Proof. The proof is by example. Consider the standard SKM with 
V = {A, B, D} and reactions 

A^f D, D^^A, D^''"'B, 

which has the KIG, G = A < — > D^B. Note that = G"". Clearly, T = 
and^±G~ ^l-D. Note also thatiV^(t) = [Nf{t),Nr{t)]',N^{t) = [Nf {t),Nr{t) + 
Nirr{t)]' and X^{t) - X^{0) = Nirr{t). It suffices to show that, under both 
Pt and Pj, E[X^{t) - X^{0)\TP] is not a version of E[X^{t) - X^(0)|7;^ V 
J"/^]. First, show that J'/' V = . Clearly, J^^ V C J-f , and since 
Nirr{s) = [Nr{s) + Nirr{s)] — iVr(s), Nirr{s) is measurable J-'f^ V , hence 
TP C F^yTp. It follows that, under both P* and Pt, E[X^{t)-X^{Q)\F^\J 
Tf] = Nirr{t) since Nirr{t) is measurable J-f- V J-^. However, Nirr{t) is 
clearly not measurable J'P and so cannot be a version of E[X^ (t) — X^ {0)\J'P] 
under either probability measure. In fact, it is possible to show that under 

Pt, E[X^{t)-X^{0)\TP] = l[Nr{t) + mrr{t)]. □ 

4.3. Histories of the separator, D. It is of interest in applications to 
understand, for a given partition [A,B,D] of V, how the histories {J'P*} 
and {^P} differ. A comparison of N^*(t) and N^{t) is equivalent to a 
comparison of the corresponding partitions of the reactions A{D). 

Proposition 4.8. The partition given by M*{A{D)) := {M{ADa) U 
M{ADab)^M{ADb)UM{ADd)} is a refinement of the partition M{A{D)) , 
so that every element of M{A{D)) is a union of elements of M*{A{D)) 
(see Definition 2.2 for the partition notation used). Hence, TP HTP and 
V FP V fP* = fP Vt. 

Proof. Take an element oi M{A{D)),Me{A{D)) say. Let m G Me{A{D)) 
and denote the element of Ai*{A{D)) to which m belongs as Ai'*^{A{D)). 
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Now M*^{A{D)) C Me{A{D)) since all elements of M%,iA{D)) change D 
equivalently (resulting in the same change to D as m does). Thus, Aie{A{D)) = 
UmeA^e(A(D)) ■^m(^(-^))) which establishes the first claim. It then follows 
from Definition 2.2 that Tj^* 2 J'P since elements of N^{t) are obtained 
by summing (where necessary) the appropriate elements of (t). Lemma 
2.1 established that J"/ V J^f V = . But C tJ^' then imphes 

In computational work with SKMs, establishing if the partitions A4*(A(D)) 
and M{A.{D)) are identical provides a straightforward means of checking 
whether the processes (t) and N^[t) are identical. The two partitions 
are identical if and only if there do not exist two reactions in different ele- 
ments of [ADyi, ADa_b, ADb, ADu] that result in the same change in D — 
that is, there do not exist 2 reactions in A{D) that change D identically but 
do not have the same membership of both of the sets [A{A), A{B)]. 

Proposition 4.9. Let [A,B,D] be a partition ofV, the species set of 
an SKM. Then N^* (t) = N^{t) Vt,Vc<; € il, if and only if the following 
condition holds: for any 2 reactions m, m € A{D) with = S^, the reaction 
m has the same membership of the two sets [A(A),A[B)] as does the reaction 
m. 

Under this condition, {J^P*} = {J^P}- 

Proof. If the condition holds both m and m are members of an equiv- 
alence class of some AD, Hence, any 2 members of an equivalence class of 
A{D) — that is, of an element of A^(A(L')) — are also both members of an ele- 
ment ofM*{A{D)). Therefore, by Proposition 4.8, M{A{D)) = M* {A{D)), 
whence N^* (t) = N^{t) Vi, Vw. 

Conversely, suppose A'^^ (t) =N^{t) yt,\/uj. Then the vectors N^*{t) = 
N^{t) have the same dimension and so A4* {A{D)) cannot be a strict re- 
finement ofM{A{D)). Hence, by Proposition 4.8, M{A{D)) = M* {A{D)). 
Suppose the reactions {m,rh) differ in their membership of the two sets 
A(^),A(i?). Then {m,m) are in different elements of M*{A{D)) but the 
same element of A4{A{D)), which is a contradiction. □ 

In applications where it is more convenient or practical to include only 
internal histories, checking the condition of Proposition 4.9 — or equivalently, 
the equality of M{A{D)) and M*{A{D)) — often reveals that the processes 
(t) and N^(t) are similar or identical. This is in part because, in prac- 
tice, many elements of A4{A{D)) are single reactions — that is, many of the 
reactions that change D are uniquely identified by the corresponding change 
in D. Furthermore, where J-^ C J-P* (strictly), the partition [A,B,D] can 
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often be altered slightly to make the processes (t) and N°{t) identical. 
Examples of this are given in Section 6 in connection with the red blood cell 
SKM. 

5. Independence and modularity. Rigorous mathematical definition and 
identification of modularizations for biochemical reaction networks is recog- 
nized as being a difficult problem, especially from a dynamic perspective 
[29]. A prominent approach has been to construct a graph representing "in- 
teractions" between species and to consider different partitions of the species 
between modules, maximizing an objective function based on the fraction of 
edges that are intra-modular relative to the expected fraction in an "equiva- 
lent," randomized graph when the same partition of species is used [13, 18]. 
From a stochastic process perspective, the graphs used often do not encode 
properly the dependence structure of the molecular network — for example, 
in contrast to a KIG, metabolic network graphs typically omit the local 
dependence between reactants in the same reaction, only capturing that be- 
tween reactant and product. The approach is intended to operationalize the 
concept that modules function "near-independently." However, the measure 
of modularity adopted for the objective function is rather distant from well- 
defined notions of dynamic (in)dependence between species. The local and 
global conditional independence results developed in Sections 3 and 4 make 
it possible to add content to and make rigorous what is meant by near- 
independence of modules, and to accommodate "overlapping" modules with 
nonempty intersection. 

The term modularization is derived from the biological literature where 
"modularity" has been much discussed. A modularization here is a hyper- 
graph of the vertex set of the KIG (i.e., a collection of subsets of species) 
with the following property — the internal history at time t of each subset 
(or module) is conditionally independent of the internal history of all the 
other modules, given the history of its intersection with those modules. 

Definition 5.1. Let V be the species set of an SKM [iV, S, P] . The finite 
collection of subsets of V, {Mrf|Mrf C V}, is a modularization of the SKM if 
and only if IJd ^^^d = V and 

(5.1) 7-f'*X7|^=^''^'^n-^f';P Vd,t, 

s* 

where Sd = H {|Jg_^^Me} and the history {T^ is the natural filtration 
of N^d{t). The latter is given as usual by Definition 4.2, applied to the 
partition [Ma \Sd,V\Md, Sd] ■ 

Note that since V \ Md = {UeT^d -^e} \ Sd, (5.1) is equivalent to the state- 
ment J^^^'''^^'' JL jr^\'^^<i I ; p vd, t. Roughly speaking, the global evolution 
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on [0, t] of the species in Ma \ Sd and the species in V \ Md ( "the rest of 

the network") are conditionally independent given the history of the in- 

s* 

tersection, J-"^ . We will say that two modularizations are nested if each 
module of one of the modularizations is contained in some module of the 
other modularization. 

Of course some modularizations of an SKM will be more useful than oth- 
ers. It will usually be desirable for the intersections Sd to contain a relatively 
small number of species and to be able to move between nested modulariza- 
tions, thus considering finer and coarser levels of resolution. Computationally 
efficient methods are developed below for the identification of such modu- 
larizations that are based around the maximal prime decomposition of the 
undirected version of the KIG of the SKM, C. It will be proved below that 
applying such graphical decomposition methods results in subgraphs whose 
vertex sets, {Ma} say, satisfy the graphical separation Md ±g~ [Je^d^^el^d 
yd. Therefore, under the conditions of Corollary 4.6, the required global 
dynamic independence of (5.1) holds for all d, and {M^} constitutes a mod- 
ularization according to Definition 5.1. 

5.1. Identifying modularizations by graph decomposition. Some defini- 
tions from the graphical literature will prove useful (for further details, see 
[21]). An undirected graph is said to be complete if there is an edge between 
all pairs of vertices in its vertex set. Let H be an undirected graph with 
vertex set V. The subgraph induced by Md C V, H{Md), consists of the ver- 
tices in Md and exactly the edges between those vertices that occur in H 
itself. A partition of V, A,B ^ 0, forms a decomposition of H into 

the subgraphs H{A U D) and H{B U D) if the separation A _Ljy B\D holds 
and the subgraph H{D) is complete. The subgraph H(Md) is prime if there 
does not exist a decomposition of H{Md). 

Definition 5.2. Let H be an undirected graph with vertex set V, and 
^ V. The induced subgraph H{Md) is a maximal prime subgraph of 
H if H{Md) is prime and there exists a decomposition of H{N) for all N 
satisfying Md C N CV. The maximal prime subgraph decomposition (MPD) 
of H is given by {H{Md)}, the unique collection of maximal prime subgraphs 
of H, and satisfies that \Jd-^d = V. 

A junction tree representation of the MPD, Tmpd; always exists and has 
the subsets {Md} as its clusters (i.e., as the vertices of the junction tree) 
[24]. A junction tree T is a connected, undirected graph without cycles 
in which the intersection of any 2 clusters of the tree, Md fl Mg {d^ e), is 
contained in every cluster on the unique path in T between Md and Mg. Such 
trees will prove very useful in visualizing, representing and manipulating 
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modularizations of SKMs. We say, for reasons that will become apparent, 
that any 2 clusters adjacent in the tree are separated by their intersection, 
and call that intersection a separator of T- 

The SKM modularization algorithm presented below contains as a special 
case the method due to [24] for computation of Tmpd, applied to the undi- 
rected version of the KIG, . The advantage of this version of Algorithm 
5.1 is that it can be fully automated to identify the MPD modularization of 
the SKM in a manner that is computationally feasible even for very large 
SKMs. However, it will often be informative to consider a range of nested 
modularizations in order to explore the different levels of organization of 
the reaction network. To this end, the general version of Algorithm 5.1 first 
obtains a junction tree of the clique decomposition for (a minimal tri- 
angulation of C) — this provides the finest, most detailed modularization 
that is identified. The clique decomposition of is unique (since it corre- 
sponds to the MPD of G^). Coarser-grained modularizations, including the 
MPD one, are obtained by successively aggregating adjacent clusters in the 
junction tree. 

Algorithm 5.1. Let G be the KIG of an SKM. 

1. Construct G~, the undirected version of G; 

2. Construct G^, a minimal triangulation of C; 

3. Obtain the clique decomposition of G^ with the cliques, {Ci, C2, . . . ,Cs} 
say, ordered to satisfy the running intersection property (i.e., for e = 

2,...,s,3d* G{i,...,e-i} s.t. Cen{U-=ia}^Q.); 

4. Organize the clique decomposition as a (rooted) junction tree Tc in which, 
for e = 2, ... ,5, the parent of Ce is Cd* ; set T = Tc', 

5. Either go to step 7 or, select a pair of adjacent clusters {Ci,Cj) in T 
(i < j) and aggregate them by updating T as follows: set P = pa(Ci) and 
G = {ch(Cj) U ch(Cj)} \ Cj, replace cluster i by Cj U Gj (retaining its 
numbering, i), set pa(Cj) = P, and set ch(Ci) = C; 

6. Go to step 5; 

7. Return Tmod = T. 

The property that G^ is triangulated is equivalent to saying that G^ can 
be decomposed recursively until all the resulting subgraphs are complete 
[21]. Such a recursive decomposition produces a collection of subgraphs con- 
taining the cliques {G^{Gd)}, that is, the maximally complete subgraphs of 
G'^. Triangulation refers to the operation of adding edges to G~ so that it 
becomes triangulated. The triangulation in step 2 must be minimal — 
that is, one for which removal of any edge added during triangulation results 
in an untriangulated graph — otherwise. Remark 5.1 below does not hold, in 
general. 
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Efficient algorithms have been developed in the graphical literature for 
both minimal triangulation and clique decomposition (see [3, 24]) which 
can be exploited here to compute the SKM modularizations and associated 
junction trees. The following special case of Algorithm 5.1 returns the junc- 
tion tree representation of the maximal prime decomposition (MPD) of the 
undirected KIG, [24]. 

Remark 5.1. Algorithm 5.1 returns Tmpd for the undirected KIG, C", 
when step 5 is replaced by: 

5'. While [there exists a separator S of T such that G^{S) is incomplete], 
aggregate within T the 2 clusters separated by S; then go to step 7. 

It is worth noting the time complexity of steps 2 and 4. The general 
problem of finding an optimal triangulation of an undirected graph (i.e., one 
that adds least edges among all triangulations) is NP-hard. The complexity 
of minimal triangulation (step 2) is 0{ne) where e is the number of edges in 
G~ , [24]. The complexity of constructing the clique junction tree Tc (steps 
3 and 4 combined) is 0{n^), [24]. 

5.2. Nested modularizations and junction trees. A concise proof that the 
clusters of the tree Tmod returned by Algorithm 5.1 constitute a modular- 
ization of the SKM — with any choice of aggregation scheme in stage 5 — is 
made possible by establishing that 7mod> like Tc*, is a junction tree, and 
that the intersections of adjacent clusters of Tmod continue to correspond 
to separators in Gy, and hence in The following proposition does just 
that. 

Proposition 5.2. Let Tmod the undirected graph returned by apply- 
ing Algorithm 5.1 to the KIG, G, of an SKM. Denote the clusters (modules) 
of Tmod by {M^}. Then Tmod is a junction tree. Suppose that {M^^Me) 
are any 2 adjacent clusters in Tmod with separator S^e '■= D Mg, and 
that (as is conventional) the edges Ma ~ Me are labeled by the corresponding 
separator Sde- 

Then Sde = ^de H V^d and the graphical separation V^e -L V^dlSde holds in 
G^, and hence in G'" , where Vde (Ved) is the union of the clusters in T^qd 
(TmoT))' "^OD ^'^^ subtrees obtained by cutting the edge Md ~ Mg 

in Tmod, and Md C Vde (Me^Ved). 

Proof of Proposition 5.2 is given in Appendix B. 

We can now state and prove the result that establishes the validity of our 
modularization identification methods. 
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Theorem 5.3. Let G he the KIG of a standard SKM, [N,S,P], and 
let Tmod be the junction tree of modules, {M^}, returned by Algorithm 5.1. 
Suppose also that Gondition 4-3 holds for = A{Md \ Sd) H A(V \ Md) 
\/d. Then {Md} is a modularization of the SKM in the sense of Definition 
5.1; and each Sd is given by Ueene(Md) '^de [where ne{Md) is the indices of 
those clusters that have edges with Md in Tmod ]■ Furthermore, each module 
residual Md \ Sd is locally independent ofV\ Md given the internal history 
ofMd. 

Proof. By Corollary 4.6, it suffices to show that the separation {Md \ 
Sd} ±Gj {V \ Md}\Sd holds in G^, for aU d, since then {Md \ Sd} ±g~ {V \ 
Md}\Sd holds in the undirected KIG G^ . This follows because every path 
in G^ from Md \ Sd V \ Md is also such a path in G'^ . Recall that by 
definition Sde ■ = Md fl Mg. Hence, 




(5.2) 

= U (MrfnMe)= U Sde, 

where the second line holds by the fact that Tmod is a junction tree (Propo- 
sition 5.2) since, for e ^ ne{Md), {Md H Me) is contained in Mg, and thus in 
Sde for some e € ne{Md) lying on the unique path between Md and Mg in 
Tmod- 

By Proposition 5.2, Md -Lg~ Ve^Sde Ve € ne{Md), since Md C Vde - Hence, 
Md J-G- Kd|{Ueene(Md) ^de} and, since this holds for all e G ne{Md), Md -Lg~ 

{Ueene(M,) ^ed}\{\JeGne{Ma) ^de}- NoW {Ueene(M,) ^ed} = {{Je^d^e}- To SCC 

this, note that the latter is the union of those clusters reachable by paths in 
Tmod that start with the edge Md ~ Mg for some e G ne{Md) (since Tmod 
is connected); and Ved is the union of those clusters reachable by paths in 
Tmod that start at the node Me (since T^qd connected). Therefore, using 
(5.2), Md ±G~ {Ue^d^e}\Sd, as required. □ 

6. SKM of red blood cell. We now apply the modularization techniques 
of Section 5 and the underlying dynamic independence theory on which they 
are based to identify biologically interesting modulariszations of an SKM of 
the human red blood cell. The study of this metabolic reaction network was 
an early success of a systems biology approach [19, 27]. There now exists 
detailed knowledge of the component reactions as a result of at least three 
decades of research on both the biochemical and mathematical modeling 
fronts. The identification of aggregates of metabolites (i.e., species) and reg- 
ulatory structures in the red blood cell has also received attention from a 
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systems biology perspective [19, 25]. This particular reaction network there- 
fore constitutes a suitable test-bed to establish the utility and applicability 
of our approach. In contrast to this work, [19] aims to identify "pools" of 
metabolites in the red blood cell, that is "aggregate groups of [species] which 
[. . . ] move together in a concerted manner," rather than groups that move 
independently given an appropriate conditioning set of species. 

The SKM studied is the one implied by the metabolic network of the red 
blood cell published in the open access Biomodels Database [23], which in 
turn is a slightly extended version of the kinetic model of [27] and [14] . The 
SKM consists of 38 reactions, with 45 different biochemical species in the 
species set V (the enzymes, i.e., catalysts, involved are omitted from V as 
they do not appear explicitly in the reaction mechanisms). Full details are 
available from [23] . The direction of the reactions is as for the kinetic model 
in Table 1 of [14], except for 8 additional reactions which are all included as 
dissociation reactions. It was verified that the SKM, henceforth SJCMrhc, is 
a standard SKM (according to Definition 4.3). The names of the biochemical 
species in V and the associated abbreviations used are given in Appendix 
C. For details of the reactions in M., the reader is referred to [23]. 

Figure 2 depicts the kinetic independence graph G for iS/CAlrbc- The 
graph is a powerful visual aid to understanding the architecture of the 
molecular network and can be preliminarily inspected for interesting local 
independences and separations in the undirected version C . The clique de- 
composition, Tc, from Algorithm 5.1 for SJCAij^hc has many clusters (20 out 
of 38) for which \ Sd is the empty set. It is therefore desirable to imple- 
ment Algorithm 5.1 with a substantial degree of pairwise cluster aggregation 
in step 5. On the other hand, Tmpd for this SKM is overly coarse-grained 
for most purposes. Figure 3 depicts a particular junction tree 7mod,i re- 
turned by Algorithm 5.1, with the choice of aggregations guided both by the 
structure of Tc itself and the goal of a modularisation that offers biological 
insight. This approach relies on and takes advantage of the flexibility offered 
by Algorithm 5.1 — some exploration of alternative modularizations by the 
user is required, but no prior information about possible modularizations is 
needed. 

The junction tree 7mod,i in Figure 3 is labelled as follows. The dth module 
(rectangle) is labeled with the species in the "residual module," Mci\Sd, and 
each edge, ~ Me, is labeled with the species in the separator Sde, that is 
the intersection of the modules connected by that edge. It was verified that, 
for ah d, Condition 4.3 holds for = A(Mrf \ Sd) n A(V \ M^) , as required 
by Theorem 5.3. Recall that Proposition 5.2 implies Sde = Kie H V^d and 
Vde -Lg~ Ved\Sde in G"'. Such separations may be conveniently read off from 
any junction tree Tmod since Sde, Vde and Ved are all immediately apparent 
from examination of the tree. Similarly, the defining conditional indepen- 
dencies of the modularization, namely J^^''-^^''- JL J-"/'^"^'' ''|J"j^'*;P Vd 
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Fig. 2. Kmetic independence graph (KIG) of SICMrhc, the Metabolic Network of the 
Human Red Blood Cell [23]. The KIG is constructed according to Definition 3.1. Full 
species names are given m Appendix G. 

(5.1), may be read off the junction tree using Theorem 5.3, being given 
by the union of the labels of all edges that connect with the dth module. 

Having obtained a modularization such as 7mod,1) the next stage is to 
ask what are the interesting features that emerge from a biochemical and 
systems biological perspective. Each of the main modules of 7mod,i turns 
out to contain like species, either in terms of their molecular structure (e.g., 
the groupings of monosaccharide-phosphate sugar molecules and phospho- 
glycerate molecules) or their function (e.g., the grouping of species involved 
in reduction-oxidation reactions), or both. 

Specific modules and their residuals are denoted by their first constituent 
species in the subsequent discussion. Consider first the central residual 
{NADPf , . . in Tmod,!, the residual of what will be termed the Redox 
module (for i^erfuction- oxidation). The red blood cell is subject to oxidative 
stress due to reactive oxygen species, which if left unchecked leads to cell 
lysis (bursting) and consequent anemia. All of this residual's species can 
be seen to play a role in the control of such oxidative stress. Glutathione 
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Gri23P2f, Mgf, MgGri23P2, PRPP. ATPf. Gri13P2, PEP. Gri3P. Gri2P, Phi. NAD, Phiex | | NADPf, GlcASP, GSSG, Pif, P2f' 



Glc6P, NADI I, Pyr, Rul5P 



P2NADPH, P2NADP, P1NADPH, P1NADP, GSH 



MgATP, Mg ,DP, MgAMP 



Lac, Pyrex, Lacex 



Fig. 3. Junction tree representation, 7mod,i, of a modularisation of iS/CA^rbc, the 
Metabolic Network of the Human Red Blood Cell [23]. The global dynamic independences 
jr^/jj j-Ue^d 't^-pSd fiold for each module Md (see Definition 5.1). The modules (rectan- 
gles) are labeled with their residuals and edges are labeled with the intersection of adjacent 
modules. Full species names are given in Appendix C. 



{GSH) acts as an antioxidant, scavenging reactive oxygen species and it- 
self being oxidised as a result (giving rise to the reaction 2GSH — )• GSSG). 
The cell must maintain adequate levels of GSH., which it does by producing 
large amounts of NADPH for use in the reduction of GSSG (by the reac- 
tion GSSG + NADPH 2 GSH + NADP). Production of NADPH is via 
2 reactions (usually described as the oxidative phase of the pentose phos- 
phate pathway), both of which involve GlcA6P. Both NADP and NADPH 
are also found bound to the proteins PI and P2. Notice that the reduced 
forms NADPH and NADH are both found in the module's separator (edge) 
with {Lac, Pyrex, Lacex}, since both influence the intensity of lactate {Lac) 
production and export as reactants for the reduction of pyruvate {Pyr). 

The module {NADPf, . . .} clearly has an important function in oxidative 
stress control and in reduction-oxidation reactions more generally within the 
red blood cell. Of course, these functions of its individual species are well 
known. That their dynamic evolution, together with that of lactate (Lac), 
is globally independent of all the other species in the network conditional 
on the internal history of {Fru6P, Glc6P, NADH, Pyr, Rul5P, Pyrex} is an 
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insight provided by the modularizations (see also the derivation of 7mod,2 
below). Assigning function(s) where possible to each module of a given mod- 
ularization, TmoD) is likely to improve both understanding of a network and 
ultimately aid attempts to control it. For reasons of space, comments related 
to the remaining two large residuals of 7mod,i may be found as part of the 
discussion of 7mod,2 below. 

The structure of 7mod,i encourages further aggregation in an obvious 
manner. A second modularization, 7mod,2i of 5/CA4i.bc is thus shown in 
Figure 4. [It was verified that, in this case also. Condition 4.3 holds for 
Tfi = A{Mj_\Sd) n A(V\M£;) yd.] 7mod,2 may be derived from TMOD.iin two 
steps. First, the modules {AMPf, ADPf}, {Lac, Pyrex, Lacex} and {Glcout} 
are aggregated with their adjacent modules in 7mod,i- Second, a small num- 
ber of species in residuals are then judiciously included also in an additional 
module so that they both fall instead in the relevant separator and the con- 
dition of Proposition 4.9 is satisfied for each partition [Md \ Sd, V \ Md, Sd\- 
By Proposition 4.9, this ensures that and N^*d are the same subprocess, 

whence {T^''} = {T^ } for d = 1, 2, 3. Clearly, the second step need only be 




Gri23P2f, Mgf, MgGri23P2, MgAMP, PRPP, ATPf, AMPf, Gri13P2, PEP. Gri3P, Gri2P, NAD, Phiex NADPHf, Lac. Lacex, NADPf. GlcA6P, GSSG. Pir, P2f. P2NADPH, P2NADP. P1NADPH, P1NADP, GSH 



Fig. 4. Junction tree representation, TLiod,2, a coarser-grained modularization of 
SICMrhc, the Metabolic Network of the Human Red Blood Cell [23]. The global dynamic 

independences J-f '"^ ^L t)^"*'^ " | J^/"* hold for each module Md (see Definition 5.1). The 
rectangles contain module residuals and edges are labeled with the intersection of adjacent 
modules. Full species names are given in Appendix C. 
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performed if it is desired to be able to replace hy J-^'' in the defin- 
ing conditional independencies of the modularization (Definition 5.1). The 
species involved in this case are {Fru6P , Frul6P2 , Phi, ADPf , Pyrex} and 
these therefore now appear in the edge labels (separators) of 7mod,2 rather 
than in the residuals. The proposition below establishes that the validity of 
the modularization remains unchanged by such an operation. 

Proposition 6.1. Suppose that {MalMa C V}, is a modularization ac- 
cording to Definition 5.1 of a standard SKM [N, S, P] , and that the modu- 
larization satisfies, for all d, the separation {M^} -Lg~ {[Je=id-^fi}\^d in the 
undirected KIG . Define a new collection of subsets {Md\Md C V} where 
Md = MdU {[Je^d'^ed} and, Ve / d, Ced C Me and Ced D Ma = {ced = 
being allowed). The species Ced are called those "copied from e to d. " 

Then {Md} J-g- {Ue^d-^eil'^'i ^'^ provided that Gondition 4-3 con- 
tinues to hold for td = A{Md \ Sd) D A(V \ Md) Vd, {Md\Md C V} is also a 
modularization of the SKM [N, S, P] . 

Proof. Clearly, Ud^d = V- By Corollary 4.6, it suffices to show that 
{Md} ±G~ {[Jej^d^^e}\^d Vd. Let td ■■= Ue^d^ed, the species copied to d, 
and fd ■= [Je^d'^de, the species copied from d. The separation {Md} -Lg~ 
{[Je^dMe}\Sd implies that {Md U td} ±g~ {{Je^d^e} U fd\{Sd UtdU fd}, 

which yields the required result since Sd = {Md U td} H [{Ue^d^e} ^ fd] — 
SdUtdUfdU0. □ 

There are 3 modules comprising 7mod,2 which together contain 45 dif- 
ferent species, of which 32 distinct species are found only in module residuals 
(and hence are found in exactly 1 residual). The redox module {NADPHf , . . .} 
has already been discussed above. The module {Glcin, . . .} has the largest 
intersection with the rest of the network and acts as a linking module; it 
will be termed the MPS [Monosaccharide-Phosphate Sugar module). The 
two modules {NADPHf, . . .} and {Gri23P2f , . . .}, by contrast, have only 2 
species in common, namely (NADH, Pyr) — these are the only species com- 
mon to all three modules. The MPS residual contains species that all be- 
long to a single chemical class of molecule, namely monosaccharide sugar 
molecules (mostly with phosphate groups attached), with a further 6 dif- 
ferent monosaccharide-phosphates (MPs) found in the rest of the module. 
Interestingly, the MPs of the module are those found in two "pathways" tra- 
ditionally discussed separately — the pentose phosphate and glycolytic path- 
ways. Indeed, the MPs {Glc6P, Fru6P, GraP) all participate in reactions 
found in both "pathways." 

The third and final module {Gri23P2f , . . .} will be termed the PGA 
{Phos- phoGly cerate- Adenosine) module, according to the chemical class of 
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some of its constituents. It contains all of the phosphoglycerate molecules in 
the species set V, namely {Gri23P2f, MgGri23P2f, Gril3P2, GriSP, Gri2P), 
together also with all of the adenosine phosphate molecules {ATPf, ADPf, 
AMPf) — both free and complexed with magnesium {Mg). The module also 
contains all of the species involved in reactions of the so-called "pay-off 
phase" of glycolysis whose function is the production of the high-energy 
compounds ATP and NADH. That the dynamic evolution of, for example, 
all phosphoglycerates together with PEP is globally independent of all the 
other species in V conditional on the internal history of {Frul6P2, MgATP, 
MgADP, GraP, NADH, Pyr, RibSP, ADPf, Phi) is again an insight provided 
by the modularization. 

The modularizations 7mod,i and 7mod,2 identified using the theory and 
methods developed in the paper constitute parsimonious, coarse-grained 
views of the metabolite network studied and provide important insight con- 
cerning the dynamics of the biological system as a whole. 

7. Directions for future research. Application of the methods developed 
here to SKMs with large species sets and many component reactions is of 
considerable interest. In ongoing research that examines biochemical sig- 
nalling networks with approximately 900 reactions and 750 species, the 
methods have been found to work effectively and to provide scientifically 
interesting modularizations. 

It would be useful to consider methods for testing the adequacy of an SKM 
(perhaps augmented to allow for measurement error) as a statistical model 
of a given cellular system. Testing conditional independence relationships 
implied by a modularization of the SKM (such as the one in Figure 4 for 
the red blood cell) offers a promising means of assessing model adequacy. 
Clearly, it is not necessary to measure experimentally all species in the SKM, 
but all species in the relevant conditioning set (separator) must be measured. 
Intuitively, with A -Lg~ B\D, changes in B — perhaps resulting from direct 
intervention on the levels of B — should be uninformative about changes in 
A over time intervals sufficiently short to ensure that levels of D usually 
remain constant (and vice versa). 

SKMs subject to interventions are likely to become an area of active 
research, given their relevance both to medical and biotechnological appli- 
cations. The predicted effect of interventions (e.g., gene knock-outs, RNA 
silencing, or receptor inhibition) could be derived by altering the specifica- 
tion of the SKM accordingly and comparing with the original SKM. There 
are also interesting connections with the causal inference literature more 
generally. Recently, Commenges and Gegout-Petit [2] introduced a "general 
dynamical model as a framework for causal interpretation," adopting an ap- 
proach to causality based on "physical laws in sufficiently large systems." 
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Local independence plays an important role in their analysis and defini- 
tion of influence. One might imagine that a sufficiently large SKM would 
be a candidate "perfect system" for a given smaller and observable cellular 
system. However, the jump processes followed by biochemical species and 
hence also SKMs do not belong to the class ("D) of special semimartingales 
to which [2] confines attention. Nevertheless, the approach seems relevant in 
broad terms. Finally, the experimental design of interventions to test causal 
claims derived from SKMs merits attention. 

APPENDIX A: PROOFS FOR GLOBAL DYNAMIC INDEPENDENCE 

A.l. Proof of Theorem 4.4. First we show that jr'^(^)^^{^) = T^^{t) C 
J-P* , in order to establish (A.l) below — that is, the internal history of all 
reactions that change A and B is contained in the internal history of A^^* (t). 

The separation A -Lg~ B\D implies that for any m € A(A) fl A(i?), R[m] C 
D [suppose not — then in the KIG G, either pa{B) n A / or pa{A) riB^0 
which contradicts the separation]. For any m G A(yl) n A{B), R[m] ^ 
and R*[m] 7^ by Definition 4.3(iii) and (iv); clearly i?*[m] C D. Hence 
m G A{D) and ADab = A{A) n A{B) (with the possibility ADab = not 
excluded). By Condition 4.3, any reaction in F = A(^) n A{B) changes D 
differently — that is, the partition A4{ADab) is either empty or consists of 
singletons — since Mm^rfi € F {m ^ rh), 5"" / and (5"")"^ = (5"")^ = 
because R*[m] C D (similarly for rh), hence (5*")^ 7^ i'^^)^ ^-^d 5^ ^ S^. 
Hence ivf {^)nA(B) ^ ^d^^^^ _^A{^)nA(B) ^ j,d^^^^ ^ -^p* ^ ^^^^^ 

implies immediately that 

(A.l) 7-f(^)xj-f(^)"^(^)|7-r;Pt. 
Together with 

(A.2) 7-f(^)xj-f(^)\^(^)|J-r;Pt 
(which is proved below) it follows that 

Since 7-f (^)n^(^) v F^' = T?' and J-f (^)\^(^) v J-f (^)n^(^) = J-f (^). It 
then follows that _IL Tf \J-['* ; Pt as required since it is clear from the def- 
inition of N^{t) and N'^{t) that C jf and C (The reader 
unfamiliar with conditional independence of cr-fields and its properties is 
referred to [7] — see, in the context of this proof. Theorem 2.2.1, Corollary 
2.2.4, Theorem 2.2.10 and Corollary 2.2.11 there.) 

It remains to establish (A.2). Under P, and hence also under P^, {J^\m = 

1, . . . , M} are independent cr-fields (see Lemma 4.2). It follows that T^^^^ _IL 
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j^A[B)\AiA)^j^ADD.p^ since [A{A), A{B) \ A{A), ADd] is a partition of {1, 



,M}. It now suffices for (A. 2) to show the existence of Q J^t 



A{A) ^ ^A(A) 



and (^)\^(^) C j-f (^)\^(^) , such that V (^)\^(^) v J-f = Tf' . 

Heuristically, we want to identify "information" contained only in T^^^^ and 
^A(_B)\A(yl) ^ respectively, which when jointly combined with gives the 

internal history of . But this corresponds exactly to the way NP* = 
[{N^{t),N^^{t)},Nj^{t),Ng{t)] was constructed. 

Recall Definition 4.2 for N^' (t); its history T^* is given by [T^{t) V 

J'ab (t) V J'E (t) V -^D (*)] • Since AD a QA{A), [t] C jf C jf ; sim- 
ilarly ADab C A{A) and hence /■i^B(t) = jf C jf and AZ)b C 

A{B) \ A{A) hence J^^it) C jf C . Note that Tg{t) = jf 

since A4{AD£)) is either empty or consists of singletons — any 2 reactions 
that change D alone must do so differently since no 2 columns of S are equal 

(by Definition 2.1). Finally, taking ^f^^^ = F^{t) y F^j^{t) c jf and 
QA(B)\AiA) ^ ^ jrHB)\A(A) ^^^^^^^^^ ^^^^^ ^-^^^ ^^^^^ gHA) ^ 

gAiB)\AiA) ^ j,ADn ^ ^D' ^ 

A. 2. Proof of Theorem 4.5. The proof is in 3 parts. (I) First show that 
Vm E M., either R[m] C AUD in which case Jg Xm{u) du is adapted to T^"^* , 

or i?[m] C 5 U -D in which case /g* \m,{u) du is adapted to F^^* . 

The separation A J-g- B\D implies that either R[m] A\J D or R[m\ C 
BU D. Suppose not, then B n R[m] ^ and A n i2[m] 7^ — arguing using 
(i) of Definition 4.3, either m G A{A) in which case i? n pa(74) 7^ 0, which 
contradicts the separation; or m G A{B) in which case Anpa(-B) 7^ 0, which 
also contradicts the separation. If S n R[m] ^ and A R 7^ 0, then 

m € AD^i is not possible — if m € ADd then the reactants that are changed 
i?*[m] C D and hence, by (iv) of Definition 4.3, either B n i?[m] 7^ or^ n 
R[m] 7^ but not both. 

Therefore, if R[m] CAuD (resp., R[m] Q B U D) then both Xm{t) and 
log(Am(i)) are measurable with respect to J^^-^"^^ c J'^'^^ C J"/^^* (resp., 
jtBud ^ jrB^3*) by (2.6), since X^H(t_) ig measurable '""^ and C 
J"/^*. Since Xm{t) is also caglad, if R[m] (lAuD (resp., /2[m] (IBUD) then 
Am(i) is .7^/^^ -predictable and hence J^Xm{u)du is -adapted (resp., 

Jf^* -adapted). 

(II) Second show that if R[m] C A U D (resp., R[m] <^ B U D), then 
{T^}s>i are J^f^^ -stopping times (resp., J-'f^ -stopping times). It will then 
follow that l(rj" < t) log{Xra{T^)) is -F^^* -measurable Vs > 1 (resp., T^^*- 
measurable) by the definition of T^S* , because log(Am(i)) is left continu- 

^ s 

ous and hence log(Am(T'J")) is J^^i?* -measurable — see, for example. Theo- 
rem 2.1.10 of [20]. This in turn yields that ^^^^ l(r™ < t) log(A,„(T™)) is 
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J-^ -measurable (resp., -measurable). To establish the required stop- 

ping time property for {TJ"}s>i, distinguish the following cases, exactly one 
of which must hold Vm € M : 

(i) m € ADd: recall that M{AD£)) consists of singletons since any 2 
reactions that change D alone must do so differently (by Definition 2.1). 
Therefore Nmit) is adapted to = T£{t) C jr/^*, hence {T™} s>i are 
7"/^* -stopping times. Either R[m] <^ A U D or Rim] C B U D [by part (I) 
above]. If R[m] C AUD (resp., R[m] C BUD), then {T^}s>i are necessarily 
J"/^^* -stopping times (resp., J"^^^* -stopping times), as required. 

(ii) m € AA n A{B): recall from the proof of Theorem 4.4 that R[m] C D 
and the partition M{ADab) consists of singletons. Hence J-J^ C jr^^^'^(^) _ 
J^Asi^) Nra{t) is adapted to F^* and {T™} s>i are J"/^* -stopping 
times. 

(iii) m G A(A) \ A{B): we have that R[m] AU D by part (I) above; 
consider the cases (iiia) m G AD a, and (iiib) m ^ AL*^ in turn below to 
conclude that in each case {T™}s>i are J-j^^* -stopping times. 

(iiia) Identify the element of M.{ADa) corresponding to changes in 
D equal to S^, M.e{ADA) say. Denote the corresponding element of N^{t) 
by N^^{t), which is clearly measurable J-^* , and the jump times of this 
univariate counting process by {T^^{s)}s>i- Thus {T^^{s)}s>i are J-'f^^ - 
stopping times. Now A4e{ADA) may not be a singleton, but we can write 

S>1 

since € Me{ADA) (m ^ m) s.t. S^^^ = (by Definition 2.1 and 

5"^ = = 0). Since X^u^(t) is right continuous and X^u^(t-) left contin- 
uous, and both are 7"/^^* -adapted since Ji^^^-adapted, [X^^^(T^g(s)) - 

X^'^'^{T^^^{s)-)] is 7'^^*(r^g(s))-measurable (by, e.g.. Theorem 2T.10 of 

[20]). Hence the summand is 7""^^* (t)-measurable Vs > 1, Nm{t) is adapted 
to T(^^* and {T]"}<(>i are 7"/^^* -stopping times. 

(iiib) Then m G A*(^) := A(/) \ (A(S) U A{p)). D_efine for any p- 
variate counting process (p > 1), the "ground process" -/V(t) := V^^^Nit) 
We may then write 

where N^*^^{f') is the number of reactions on [0,t] that change A alone 
[noting that A{A) n A{B) = ADab]- Hence N^*'-^\t) is measurable 7^^ V 
J'A{^)^^ABi^) ^ T"/^^*, and its jump times {T^ *^^^)},>i are 7"/^^* -stopping 
times. Also, 

N^{t) = i{t5(^) < t}i{x^(r5(^)) - x^(r5(^)-) = 5;^}, 

S>1 
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since M G A*(^) (m m) s.t. = (by Definition 2.1 and 5^^-^ = 
S?y^ = 0). Since X^{t) is right continuous and X'^{t—) left continuous, and 

both are J-^^*-adapted since J-/^-adapted, [X^(ri^*^^^) - X^(rf *^^^-)] 

is T^^* {T^ ^'^^)-measurable. Hence the summand is J-""^^* (t)-measurable 
Vs > 1, and {T™}s>i are J"^^* -stopping times. 

(iv) m € /S.{B)\A{A): we have that R[m] C SuD by part (I) above; argue 
as in (iii) with A in place of B and vice versa to conclude that {T™}s>i are 
J"^^^* -stopping times. 

(Ill) Combining parts (I) and (II) above establishes that if R\m] ^A\JD 
(resp., R[m\ C B U D) then Cm,t :=exp(lm(i)) is measurable T^^* (resp., 
F^^*). Then, in an obvious manner, grouping the Cm^t into 2 groups ac- 
cording to the forementioned measurability property and defining the ipiD*,t 
as the product within each group yields Ct = '4'AD*,t ■ ipBD*,t, where 'ipiD*,t 
is nonnegative and J-"^* V J-"/^* -measurable for i G {A,B}. 

APPENDIX B: ADDITIONAL PROOFS 

Proof of Lemma 2.1. It remains to establish that J"/^ = T^^ . First 
show that ^ T^"" . We have that = a{Z^l{T^ < ti); < u < s > 1) 
and X^{u) = X^{Q) + X]s>i^/l(^s^ — '^)^ which is therefore measurable 
F^. Second show that C F^^ . We have also that F^" = (t{1{Z^ = 
S^)l{Tf < u);0 < u < t, s > l,m S AA), hence it suffices to show that 
l{Zf = S^)1{T^ < u) is measurable F^^ . By its construction, {T^} are the 
jump times of the right-continuous jump process X^ . The filtration {F^^ } is 
right continuous. Hence, for s > 1, is an J-"^^"* -stopping time and X^{T^) 
is J'^^(T/)-measurable. Since = X^{Tf) - X^{T^_^) and F^^{T^^^) C 
F^^{T^), Z^ is also J'^^(T/)-measurable. Hence {1{Z^ = s£) = 1} n 
{l(r/ < n) = 1} € F^^{u) C F^^{t) by the de&iition of F^^[t^), and 
therefore 1(.^^ = S^)l{Tf < u) is measurable F^^ . □ 

Proof of Lemma 4.1. Let As := (dP/dP)|j-,^vJ^3, and £3 := (dP/dP)|j-3. 
Then it is straightforward to show that £43 = £[£123 V F^] and £3 = 
E[>Ci23|^'^], where E denotes expectation under P. Hence, £13 = V'i3E[V'23|-^^ V 
F'^] and £23 = ''/'23E[V'i3|-^^ V J^^] by the nonnegativity and measurability of 
the V'is- Since J'^VT'^X J-i|7-3;P, £[^23!-^^ V-F^] = E[V'2„3|-^^] by Definition 
4.1 and hence i2i3 = V'i3E['i/'23|-^^]- Similarly, £23 = '023E[V'i3|-^^]- Further- 
more, £3 = E[?/;i3| J^^]E['(/'23|-^^] by the nonnegativity and measurability of 
the Vi3 and since J"W J"^ _IL J^^ y jtS] jr3. Therefore, 

(B.l) -^123-^3 = £i3£23 

and, in particular, Ci2zCz = >Ci3>C23 on the event {£3 = E['i/'i3|.7^^]E['i/'23|-7^'^] > 
0}, whence F^ _IL F'^\F^] P by Theorem 2.2.14 of [7]. □ 
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Proof of Proposition 5.2. The proof is in 3 steps, according to the 
number of pairs of clusters aggregated under step 5 of Algorithm 5.1: (i) for 
the case where no pair of clusters is aggregated, and hence Tmod = Tc'-, (ii) 
for the case where exactly 1 pair of clusters is aggregated; (iii) for the case 
where more than 1 pair of clusters is aggregated. 

(i) Tc" is a junction tree representation of the clique decomposition of 
C^. For the proof of this case see the proof of Theorem 4.6 of [3]. 

(ii) Tmod is connected (as a consequence of Tc being connected), and 
has ((5 — 1) nodes and {5 — 2) edges (one less edge than Tc)] Tmod is therefore 
a tree, whence there is a unique path in Tmod between any pair (M^, Mg) of 
its clusters. It is straightforward (but somewhat tedious) to show that every 
cluster on this path must contain PI Mg since the corresponding path in 
Tc possesses this junction property [by (i) above]. Hence Tmod is a junction 
tree. It remains to prove that for any 2 adjacent clusters {Md,Me) in TmoDj 
we have n Me = Vde D Ved and Vde -Ley Ved\Sde ■ 

We will show (iia) that edges "in common" between Tc and Tmod — the 
{5 — 2) edges not removed by the cluster aggregation — carry the same label, 
that is, the intersection of the clusters joined by each such edge is unchanged; 
and (iib) that cutting any such edge in both Tc and Tmod results in pairs 
of subtrees whose clusters have identical unions in the two cases. The result 
then follows from (i) above. 

(iia) If both clusters, (M^, Mg) , joined by such an edge, are in Tc and 
Tmod the claim is obviously true. Consider then the case where M^, say, is 
the result of the aggregation of the cluster pair (M^, M^). Suppose, without 
loss of generality, that ~ Me in 7c- Now Sde = (Me n M„) U (Me n M/3). 
The edge joining M^ to Me in Tmod was formerly, in Tc, the edge ~ Mg, 
whence Sde = (Me H M^) since is on the path between Mp and Me in 
Tc and (Me H Mg) C Mq,. Thus, the intersection of the clusters joined by 
the edge is always the same in Tmod and Tc-, as claimed. 

(iib) Let the edge that is cut in both cases be M^ ~ Me [where it 
is understood that M^, say, may be equal to M^ in Tc and hence equal 
to {Ma U Mp) in Tmod]- It is required to show, using an obvious notation, 
that Vj^°° = V^^ and Ve^°° = V^^. It is well known that cutting an edge 
in any tree results in 2 disconnected subtrees. One of the 2 pairs of subtrees 
generated here must contain 2 identical subtrees. Suppose then, without loss 
of generality, that T^-^o^ = TS'^ , whence V^^^ = V^^. The subtrees T^^^ 
and T^'^ have the same clusters, except for the aggregation of the cluster 
pair {Ma,Mp) to form M^^ in T^qd- straightforward (but tedious) to 
show that, for 7 ^ {a, (3} and a cluster in 7^*^, 

[M^]r^e \ {M„ M^} = [A-T^lr^oD \ 

where [M]^- are the clusters that can be reached from cluster M by paths in a 
tree T ■ The subtrees T^'^ are themselves connected graphs, but disconnected 
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from the corresponding 7^ . Therefore the clusters of 7^ are given exactly 
by [-My]^de, where is any one of its clusters. It follows that 



as required. 

(iii) The proof is by induction on the number of cluster pairs, n say, that 
are aggregated. Parts (i) and (ii) above establish the proposition for n = 
and n = 1. Exactly the same mode of argument as the one used in (ii) above 
also establishes that if the proposition holds for n > 0, it must hold for 
(n + 1). This completes the proof. □ 

APPENDIX C: SPECIES NAMES FOR RED BLOOD CELL SKM 

AMPf = AMP (unbound); ADPf = ADP; ATPf = ATP; DHAP = 
Dihydroxyacetone phosphate; E^-P = Erythrose 4-phosphate; Fru6P = 
Fructose 6-phosphate; Frul6P2 = Fructose 1,6-phosphate; GlcA6P = 
Phospho-D-glucono-l,5-lactone; Glcin = Glucose (cytoplasmic); Glcout = 
External Glucose; Glc6P = Glucose 6-phosphate; GraP = Glyceraldehyde 
3-phosphate; Gril3P2 = 1,3-Bisphospho-D-glycerate; GriSP = 3-Phospho- 
D-glycerate; Gri23P2 = 2,3-Bisphospho-D-glycerate; Gri2P = 2-Phospho- 
D-glycerate; GSH = Reduced Glutathione; GSSG = Oxidized Glutathione; 
Lac = Lactate; Lacex = External Lactate; MgATP; MgADP; MgAMP; Mg; 
MgGri23P2; NADH; NADPf = NADP (unbound); NADPHf = NADPH; 
Plf = Proteinl; P2 = Protein2; PINADP = Proteinl bound NADP; 
PINADPH = Proteinl bound NADPH; P2NADP = Protein2 bound NADP; 
P2NADPH = Protein2 bound NADPH; PEP = Phosphoenolpyruvate; Phi = 
Phosphate; NAD; PRPP = Phosphoribosylpyrophosphate; Pyr = Pyruvate; 
Pyrex = External Pyruvate; RihSP = Ribose 5-phosphate; Rul5P = Ribulose 
5-phosphate; SedlP = Sedoheptulose 7-phosphate; Xul5P = Xylulose 
5-phosphate. 
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